How to Read in Data From .dat Pandas

March 03, 2022

In [1]: import pandas as pd

Titanic data
This tutorial uses the Titanic data set, stored as CSV. The data consists of the following data columns:
- PassengerId: Id of every passenger.
- Survived: Whether the passenger survived, with value 0 for not survived and 1 for survived.
- Pclass: Ticket class, one of 3 classes: Class 1, Class 2 and Class 3.
- Name: Name of passenger.
- Sex: Gender of passenger.
- Age: Age of passenger.
- SibSp: Number of siblings and spouses of the passenger aboard.
- Parch: Number of parents and children of the passenger aboard, i.e. whether a passenger is alone or has family.
- Ticket: Ticket number of passenger.
- Fare: The fare paid by the passenger.
- Cabin: Cabin number of passenger.
- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).

How do I read and write tabular data?

I want to analyze the Titanic passenger data, available as a CSV file.

In [2]: titanic = pd.read_csv("data/titanic.csv")

pandas provides the read_csv() function to read data stored as a CSV file into a pandas DataFrame. pandas supports many different file formats or data sources out of the box (csv, excel, sql, json, parquet, …), each of them with the prefix read_*.
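If the CSV file from the tutorial is not at hand, the same read_csv() call can be tried on an in-memory copy of a few rows, using io.StringIO from the standard library as a stand-in for the file path. Since this post's title asks about .dat files: the .dat extension says nothing about the content, but when such a file holds delimited plain text, read_csv() with an explicit sep will usually read it. The whitespace-separated .dat layout below is a hypothetical example, not a file from the tutorial.

```python
import io

import pandas as pd

# A few rows copied from the tutorial's Titanic output, keeping only some columns.
csv_text = """PassengerId,Survived,Pclass,Name,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",7.2500,,S
3,1,3,"Heikkinen, Miss. Laina",7.9250,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",53.1000,C123,S
"""

# read_csv accepts a path or any file-like object; StringIO stands in for "data/titanic.csv".
passengers = pd.read_csv(io.StringIO(csv_text))

# Hypothetical .dat content: columns separated by runs of spaces instead of commas.
dat_text = """PassengerId Survived Pclass Fare
1 0 3 7.2500
3 1 3 7.9250
4 1 1 53.1000
"""

# sep=r"\s+" treats any run of whitespace as a single delimiter.
dat_frame = pd.read_csv(io.StringIO(dat_text), sep=r"\s+")

print(passengers.shape)  # (3, 7)
print(dat_frame.shape)   # (3, 4)
```

Empty fields, like the missing Cabin values above, are read in as NaN.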

Make sure to always have a check on the data after reading it in. When displaying a DataFrame, the first and last 5 rows will be shown by default:
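The "first and last 5 rows" behaviour comes from pandas display options: when a DataFrame has more rows than display.max_rows (60 by default), the printed form is cut down to display.min_rows rows (10 by default), split between the top and the bottom, followed by a dimensions footer. A small sketch with a made-up 100-row frame, assuming the default options have not been changed:

```python
import pandas as pd

# Made-up frame with 100 rows, more than the default display.max_rows of 60.
numbers = pd.DataFrame({"value": range(100)})

# Default display: truncated to a few head and tail rows plus a "[rows x columns]" footer.
short_view = repr(numbers)
print(short_view)

# Temporarily lift the row limit to print every row.
with pd.option_context("display.max_rows", None):
    full_view = repr(numbers)

print(pd.get_option("display.max_rows"), pd.get_option("display.min_rows"))
```

option_context only changes the setting inside the with block, so the default display is restored afterwards.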

In [3]: titanic
Out[3]:
     PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
0              1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
1              2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
2              3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
3              4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
4              5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S
..           ...       ...     ...                                                ...  ...               ...      ...    ...       ...
886          887         0       2                              Montvila, Rev. Juozas  ...            211536  13.0000    NaN         S
887          888         1       1                       Graham, Miss. Margaret Edith  ...            112053  30.0000    B42         S
888          889         0       3           Johnston, Miss. Catherine Helen "Carrie"  ...        W./C. 6607  23.4500    NaN         S
889          890         1       1                              Behr, Mr. Karl Howell  ...            111369  30.0000   C148         C
890          891         0       3                                Dooley, Mr. Patrick  ...            370376   7.7500    NaN         Q

[891 rows x 12 columns]

I want to see the first 8 rows of a pandas DataFrame.

In [4]: titanic.head(8)
Out[4]:
   PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
4            5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S
5            6         0       3                                   Moran, Mr. James  ...            330877   8.4583    NaN         Q
6            7         0       1                            McCarthy, Mr. Timothy J  ...             17463  51.8625    E46         S
7            8         0       3                     Palsson, Master. Gosta Leonard  ...            349909  21.0750    NaN         S

[8 rows x 12 columns]

To see the first N rows of a DataFrame, use the head() method with the required number of rows (in this case 8) as argument.

Note

Interested in the last N rows instead? pandas also provides a tail() method. For example, titanic.tail(10) will return the last 10 rows of the DataFrame.
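A quick way to try head() and tail() without the Titanic file is a small made-up frame; the 20-row frame below is only an illustration. Note that both methods keep the original row labels, so tail(10) on 20 rows returns the labels 10 to 19.

```python
import pandas as pd

# Made-up frame standing in for the Titanic data: 20 rows with ids 1..20.
frame = pd.DataFrame({"PassengerId": range(1, 21)})

first_eight = frame.head(8)   # rows with index labels 0..7
default_head = frame.head()   # without an argument, n defaults to 5
last_ten = frame.tail(10)     # rows with index labels 10..19

print(first_eight["PassengerId"].tolist())  # [1, 2, 3, 4, 5, 6, 7, 8]
print(last_ten.index.tolist())              # [10, 11, ..., 19]
```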

A check on how pandas interpreted each of the column data types can be done by requesting the pandas dtypes attribute:

In [5]: titanic.dtypes
Out[5]:
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object

For each of the columns, the data type used is listed. The data types in this DataFrame are integers (int64), floats (float64) and strings (object).
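The same integer/float/text distinction can be checked in code with the helpers in pandas.api.types. The three columns below use illustrative values in the shape of Pclass, Fare and Sex. Depending on the pandas version, text columns may be reported as object or as a dedicated string dtype, which is why this sketch tests the kind of data rather than comparing dtype names for the text column.

```python
import pandas as pd

# Illustrative values in the shape of three Titanic columns.
sample = pd.DataFrame(
    {
        "Pclass": [3, 1, 3],                  # whole numbers
        "Fare": [7.25, 71.2833, 7.925],       # real numbers
        "Sex": ["male", "female", "female"],  # text
    }
)

print(sample.dtypes)

# Ask about the kind of data instead of reading the printout.
print(pd.api.types.is_integer_dtype(sample["Pclass"]))  # True
print(pd.api.types.is_float_dtype(sample["Fare"]))      # True
print(pd.api.types.is_string_dtype(sample["Sex"]))      # True
```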

Note

When asking for the dtypes, no brackets are used! dtypes is an attribute of a DataFrame and Series. Attributes of a DataFrame or Series do not need brackets. Attributes represent a characteristic of a DataFrame/Series, whereas a method (which requires brackets) does something with the DataFrame/Series, as introduced in the first pandas getting-started tutorial.
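The difference between an attribute and a method shows up directly in Python: accessing dtypes gives data back, while head without brackets gives the method object itself, which only does its work once called. A minimal sketch on a small illustrative frame:

```python
import pandas as pd

frame = pd.DataFrame({"PassengerId": [1, 2, 3], "Fare": [7.25, 71.2833, 7.925]})

# dtypes is an attribute: accessing it returns a Series holding one dtype per column.
column_types = frame.dtypes

# head is a method: without brackets this is just the bound method, not any rows.
head_method = frame.head
first_two = frame.head(2)  # calling it with brackets returns the first 2 rows

print(type(column_types).__name__)  # Series
print(callable(head_method))        # True
```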

My colleague requested the Titanic data as a spreadsheet.

In [6]: titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

Whereas read_* functions are used to read data into pandas, the to_* methods are used to store data. The to_excel() method stores the data as an Excel file. In the example here, the sheet_name is named passengers instead of the default Sheet1. By setting index=False the row index labels are not saved in the spreadsheet.
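Writing an .xlsx file requires an Excel engine such as openpyxl to be installed alongside pandas. To see the effect of index=False without that extra dependency, the sketch below does the same write-then-read round trip with to_csv() and read_csv() through an in-memory buffer; the small frame is illustrative. With the default index=True, the row labels come back as an extra, unnamed column.

```python
import io

import pandas as pd

passengers = pd.DataFrame({"PassengerId": [1, 2, 3], "Survived": [0, 1, 1]})

# Default to_csv writes the row index as an extra first column with an empty header.
with_index = io.StringIO()
passengers.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the row labels out, like index=False in the to_excel call above.
without_index = io.StringIO()
passengers.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(reloaded_with_index.columns.tolist())     # ['Unnamed: 0', 'PassengerId', 'Survived']
print(reloaded_without_index.columns.tolist())  # ['PassengerId', 'Survived']
```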

The equivalent read function read_excel() will reload the data into a DataFrame:

In [7]: titanic = pd.read_excel("titanic.xlsx", sheet_name="passengers")

In [8]: titanic.head()
Out[8]:
   PassengerId  Survived  Pclass                                               Name  ...            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris  ...         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  ...          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  ...  STON/O2. 3101282   7.9250    NaN         S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  ...            113803  53.1000   C123         S
4            5         0       3                           Allen, Mr. William Henry  ...            373450   8.0500    NaN         S

[5 rows x 12 columns]

I'm interested in a technical summary of a DataFrame.

In [9]: titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

The method info() provides technical information about a DataFrame, so let's explain the output in more detail:

- It is indeed a DataFrame.
- There are 891 entries, i.e. 891 rows.
- Each row has a row label (aka the index) with values ranging from 0 to 890.
- The table has 12 columns. Most columns have a value for each of the rows (all 891 values are non-null). Some columns (Age, Cabin and Embarked) do have missing values and fewer than 891 non-null values.
- The columns Name, Sex, Ticket, Cabin and Embarked consist of textual data (strings, aka object). The other columns are numerical data, some of them whole numbers (aka integer) and others real numbers (aka float).
- The kind of data (characters, integers, …) in the different columns is summarized by listing the dtypes.
- The approximate amount of RAM used to hold the DataFrame is provided as well.
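Each part of the info() summary can also be obtained on its own, which is handy when a script should check the data instead of a person reading the printout. The sketch below uses a made-up three-row frame with one missing Age and two missing Cabin values, and the standard shape, notna(), dtypes and memory_usage() members of a DataFrame.

```python
import numpy as np
import pandas as pd

# Made-up frame: one missing Age, two missing Cabin values.
frame = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Age": [22.0, np.nan, 26.0],
        "Cabin": [None, "C85", None],
    }
)

n_rows, n_columns = frame.shape             # entries and columns, as in the info() header
non_null = frame.notna().sum()              # the "Non-Null Count" column
dtype_counts = frame.dtypes.value_counts()  # the "dtypes:" summary line
approx_bytes = frame.memory_usage().sum()   # the "memory usage" line, without deep inspection

print(n_rows, n_columns)    # 3 3
print(non_null.to_dict())   # {'PassengerId': 3, 'Age': 2, 'Cabin': 1}
print(dtype_counts)
print(approx_bytes)
```

Like the "+" in the info() output, memory_usage() without deep=True does not count the full size of the Python strings held in text columns, so the figure is a lower bound for those columns.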

REMEMBER

- Getting data into pandas from many different file formats or data sources is supported by read_* functions.
- Exporting data out of pandas is provided by different to_* methods.
- The head/tail/info methods and the dtypes attribute are convenient for a first check.


For a complete overview of the input and output possibilities from and to pandas, see the user guide section about reader and writer functions.

cofieldpreclaid.blogspot.com

Source: https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html

Cofield Preclaid
